Development and validation of SSR markers related to flower color based on full-length transcriptome sequencing in Chrysanthemum

Chrysanthemum (Chrysanthemum morifolium Ramat.) is one of the most popular flowers worldwide, with very high ornamental and economic value. However, the limited number of available DNA molecular markers and the lack of a full genomic sequence hinder the study of genetic diversity and the molecular breeding of chrysanthemum. Here, we developed simple sequence repeat (SSR) markers from the full-length transcriptome sequences of the chrysanthemum cultivar ‘Hechengxinghuo’. A total of 11,699 SSRs with mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats were identified. Of eighteen SSR loci identified in sixteen transcripts participating in carotenoid metabolism or anthocyanin synthesis, eight were validated as polymorphic SSR markers. These SSRs were used to classify 117 chrysanthemum accessions with different flower colors at the DNA and cDNA levels. The results showed that four SSR markers from the carotenoid metabolic pathway divided the 117 chrysanthemum accessions into five groups at the cDNA level, and all purple chrysanthemum accessions were in group III. Furthermore, the SSR markers CHS-3, LCYE-1 and 3MaT may be related to green color, and the PSY-1b marker may be related to yellow color. Overall, our work may provide a novel method for mining SSR markers associated with specific traits.


Results
Full-length transcriptome sequencing of chrysanthemum. Full-length transcriptome sequencing of the chrysanthemum cultivar 'Hechengxinghuo' was performed using SMRT sequencing technology. A total of 8,658,873 (7.57 Gb) clean reads were obtained, with a mean clean read length (MCDL) of 1754 bp (Supplementary Table 2). Additionally, a total of 450,789 reads of insert (ROI) were screened from the raw sequence data, with a mean read length of insert (MRLOI) of 2359 bp. Among these reads of insert, the number of full-length non-chimeric reads (NFNR) was 363,653, giving a full-length percentage (FLP) of 80.67%, and the average full-length non-chimeric read length (AFNRL) was 2162 bp (Supplementary Table 2). To determine the function of the transcripts, 55,277 unigenes obtained from the transcriptome data were compared with the following protein sequence databases: KOG, KEGG [32,33], NR, Swiss-Prot, and GO. A total of 47,436 unigenes were annotated, corresponding to an annotation rate of 85.82% (Table 1). Among all databases, NR had the highest annotation rate (84.58%, 46,754 genes), while KEGG had the lowest (40.75%, 22,528 genes) (Table 1).
www.nature.com/scientificreports/
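The percentages reported in this paragraph follow directly from the stated counts. The short Python check below recomputes them; the variable names are illustrative, and the only inputs are the counts quoted in the text.

```python
# Counts as reported in the text (Supplementary Table 2 and Table 1).
reads_of_insert = 450_789
full_length_non_chimeric = 363_653
unigenes_total = 55_277
annotated = {"all databases": 47_436, "NR": 46_754, "KEGG": 22_528}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Full-length percentage: full-length non-chimeric reads over reads of insert.
flp = percent(full_length_non_chimeric, reads_of_insert)

# Annotation rate per database: annotated unigenes over all unigenes.
rates = {db: percent(n, unigenes_total) for db, n in annotated.items()}

print(f"FLP: {flp}%")
for db, rate in rates.items():
    print(f"{db}: {rate}%")
```

The recomputed values agree with the reported 80.67%, 85.82%, 84.58% and 40.75%.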

Chrysanthemum SSR primer design and marker validation. To develop SSR markers associated with flower color traits, we focused on genes participating in flower color formation and regulation. Based on the full-length transcriptome sequences of the chrysanthemum cultivar 'Hechengxinghuo', 16 transcripts containing SSR loci were selected. Among them, seven transcripts were involved in carotenoid metabolism, namely IPP, PSY, PDS, ZDS, LCYE-1, LCYE-4 and NCED (Supplementary Table 3); the other nine transcripts participate in anthocyanin synthesis, namely PAL-1, PAL-2, PAL-3, CHS-1, CHS-3, CHI-1, CHI-2, DFR, and 3MaT (Supplementary Table 3). In total, 18 SSR loci were found in these transcripts. Subsequently, we designed five pairs of primers for each SSR locus, giving 90 candidate SSR markers (Supplementary Table 3), and used three chrysanthemum accessions to validate the efficacy of the newly developed SSR markers. Based on the results of polyacrylamide gel electrophoresis (PAGE), we found that 8 of the 90 SSR markers showed clear polymorphic amplified products. These markers were from seven different transcripts: CHS-1, CHS-3, PSY, LCYE-1, LCYE-4, CHI-1 and 3MaT (Supplementary Table 3).
Estimation of genetic diversity using the newly developed SSR markers. To evaluate the utility of the newly developed SSR markers, we used them to analyze the genetic diversity of 117 chrysanthemum accessions with various flower colors (Fig. 2). When the template is DNA, SSR markers can be amplified regardless of whether the genes are expressed; when the template is cDNA, however, SSR markers can only be amplified in accessions in which the genes are expressed. Therefore, both DNA and cDNA from floral tissues were used as templates to assess the genetic diversity of the chrysanthemum accessions. The number of alleles per SSR marker was between 4 and 5 when the template was DNA, with an average of 4.5 alleles per locus, and between 2 and 8 when the template was cDNA, with an average of 4 alleles per locus (Table 3). The polymorphism information content (PIC) of the eight markers was 0.16-0.53, with an average of 0.41, when the template was DNA, and 0.06-0.70, with an average of 0.44, when the template was cDNA (Table 3). Gene diversity (expected heterozygosity: He) ranged from 0.17 to 0.60 with an average of 0.46 when DNA was used as the template; when cDNA was used as the template, it ranged from 0.06 to 0.74 with an average of 0.49 (Table 3). The observed heterozygosity (Ho) ranged from 0.18 to 1 with an average of 0.69 when DNA was used as the template; when cDNA was used as the template, it ranged from 0 to 0.94 with an average of 0.31 (Table 3). Furthermore, when cDNA was used as the template, the LCYE-1 marker produced the largest number of alleles (8), with a PIC value of 0.70 (Table 3).
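For readers unfamiliar with these statistics, the sketch below shows how He and PIC are commonly computed from the allele frequencies at one locus. It assumes the standard definitions (He = 1 − Σp_i² and the Botstein PIC formula); the text does not state which software or estimator was used here, and the allele counts in the example are hypothetical, not data from this study.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert per-allele observation counts at one locus into frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def expected_heterozygosity(freqs):
    """Gene diversity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with three alleles observed 50, 30 and 20 times.
freqs = allele_frequencies([50, 30, 20])
he = expected_heterozygosity(freqs)
pic = polymorphism_information_content(freqs)
print(f"He = {he:.4f}, PIC = {pic:.4f}")
```

PIC is always at most He for the same frequencies, which matches the pattern of the averages reported above (0.41 versus 0.46 for DNA, 0.44 versus 0.49 for cDNA).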

Cluster analysis of 117 chrysanthemum accessions with various flower colors.
We used the eight newly developed SSR markers to cluster 117 chrysanthemum accessions with diverse flower colors. Because the amplification results of the SSR markers differed between DNA and cDNA from chrysanthemum flower tissues, the clustering results also differed. The SSR marker CHS-3 or 3MaT could cluster the chrysanthemum accessions with green traits regardless of whether the template was DNA or cDNA (Supplementary Fig. 2 and 4). The SSR marker CHS-1 or LCYE-4 could cluster chrysanthemum accessions with […] when the template was DNA (Supplementary Fig. 1a and 8a), while when the template was cDNA, their clustering results had no correlation with flower color traits (Supplementary Fig. 1b and 8b). The CHI-1 or LCYE-1 marker could cluster the chrysanthemum accessions with green traits together when the template was DNA (Supplementary Fig. 3a and 7a), while when the template was cDNA, accessions with yellow and red traits were clustered together by the CHI-1 marker and accessions with green, purple and yellow traits were clustered together by the LCYE-1 marker (Supplementary Fig. 3b and 7b). When DNA was used as the template, the clustering results of the PSY-1a marker showed no correlation with chrysanthemum color traits (Supplementary Fig. 5a), but with cDNA, accessions with the yellow trait were clustered together (Supplementary Fig. 5b). The PSY-1b marker could cluster accessions with the yellow trait when DNA was used as the template (Supplementary Fig. 6a), while accessions with purple and yellow traits were clustered together when the template was cDNA (Supplementary Fig. 6b).

Table 3. List of primer pairs and genetic diversity information. Na, number of alleles; Ho, observed heterozygosity; He, expected heterozygosity; PIC, polymorphism information content.
Taken together, these results implied that the markers CHS-3, 3MaT and LCYE-1 may be associated with green color and the PSY-1b marker may be associated with yellow color in chrysanthemum accessions. The SSR markers obtained from CHS-1, CHS-3, CHI-1, and 3MaT, which are key genes in the anthocyanin biosynthesis pathway, were integrated for cluster analysis. The results indicated that the clustering results were correlated with green traits when DNA was used as the template, but these SSRs were not associated with flower color when cDNA was used as the template (Supplementary Fig. 9).
The SSR markers LCYE-1, LCYE-4, PSY-1a, and PSY-1b, obtained from genes involved in the carotenoid metabolic pathway, were used together to classify the 117 chrysanthemum accessions. When DNA was used as the template, the 117 accessions were divided into five clusters, but there was no obvious clustering by flower color (Fig. 3a). When the template was cDNA, the 117 accessions were divided into five groups at a genetic distance of 0.65 (Fig. 3b). There were 40 accessions in group I (20 yellow, 11 red, 8 white, and 1 mixed-color), 7 accessions in group II (2 yellow, 3 red, and 2 white), 60 accessions in group III (11 yellow, 6 red, 9 white, 10 purple, 2 green, and 22 mixed-color), 3 accessions in group IV (2 red and 1 white), and 7 accessions in group V (2 yellow, 2 red, 1 white, 1 green, and 1 mixed-color). The results showed that these markers might be related to purple, yellow and multi-color characters.

Discussion
In this study, third-generation transcriptome sequencing was performed to obtain the full-length transcriptome sequences of the chrysanthemum cultivar 'Hechengxinghuo'. A total of 8,658,873 (7.57 Gb) clean reads were obtained, including 363,653 full-length non-chimeric reads, which was similar to the number in P. catalpifolia [35] (349,745) but lower than that in K. melanthera [34] (491,001). In addition, a total of 55,277 unigenes were identified with an average length of 2385 bp in the chrysanthemum cultivar 'Hechengxinghuo', whereas the mean unigene length was 585 bp in Chrysanthemum nankingense [14], 727 bp in diploid Chrysanthemum indicum [36], and 784 bp in tetraploid C. indicum [36], each of which was sequenced using NGS technology.
These results indicated that the transcripts derived from SMRT technology were longer than those derived from NGS technology. In our work, a total of 47,436 unigenes were successfully annotated, an annotation ratio of 85.82%, which was higher than the annotation ratios of diploid C. indicum (74.60%) and tetraploid C. indicum (73.60%) obtained using NGS technology [36]. This may be due to the long read length and high accuracy of SMRT technology [37]. Based on the full-length transcriptome sequences, we identified 11,699 SSR loci and developed eight polymorphic SSR markers related to flower color, and the newly developed SSRs were used for genetic diversity analysis and classification of chrysanthemum accessions. To the best of our knowledge, this is the first study to develop SSR markers using SMRT technology in Chrysanthemum. Our results demonstrated that full-length transcriptome sequencing is a reliable and effective method for SSR marker development in Chrysanthemum.

Distribution and frequency of SSRs in transcriptome. SSR molecular markers have been found to have a non-random distribution in gene regions, including CDS and UTRs [38]. Our results showed that most SSRs were distributed in the 5'UTR (43.9%), followed by the 3'UTR (34.2%) and CDS regions (21.9%) (Table 2), which was similar to other species [34]. A possible reason is that mutations in SSRs located in CDS regions can severely alter the structure and function of genes [39]. In addition, we found that most mono- (86.78%), di- (88.68%), tetra- (93.20%), and pentanucleotide (88.89%) repeats were located in UTRs, while over half of the trinucleotide (51.31%) and hexanucleotide (56.36%) repeats were located in CDS regions (Table 2). Similar results have been reported in many other species, such as coconut [40], eggplant [41], and castor bean [42]. The reason may be that trinucleotide and hexanucleotide repeats are less likely to cause frameshift mutations [40,42]. In terms of repeat type, the most frequent was mononucleotide (66.30%), followed by trinucleotide (22.91%) and dinucleotide (9.29%) repeats, which was similar to the result in Populus wulianensis [10]. However, the most abundant repeat motif was trinucleotide in coconut [40], eggplant [41], castor bean [42], and sugarcane [12]. Additionally, we found that A/T was the dominant mononucleotide repeat in chrysanthemum, which was consistent with findings in other eukaryotes [43]. However, AC/GT repeats were the most abundant dinucleotide motifs in our work, which differed from most other plants, in which AG/CT repeats were the most common dinucleotide repeats [40]. These differences in the frequency of SSR repeats may be attributed to differences in species, development tools, or SSR searching criteria [10].
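The frameshift argument can be made concrete with a small check: gaining or losing one repeat unit changes the sequence length by the motif length, and the reading frame is preserved only when that change is a multiple of three. The Python sketch below is purely illustrative; the function name is our own.

```python
def shifts_reading_frame(motif_length: int, units_changed: int = 1) -> bool:
    """Return True if gaining or losing `units_changed` repeat units changes
    the sequence length by a number of bases that is not a multiple of 3."""
    return (motif_length * units_changed) % 3 != 0


repeat_types = {
    "mononucleotide": 1,
    "dinucleotide": 2,
    "trinucleotide": 3,
    "tetranucleotide": 4,
    "pentanucleotide": 5,
    "hexanucleotide": 6,
}

# Effect of a single-unit expansion or contraction for each repeat type.
frame_effect = {
    name: shifts_reading_frame(length) for name, length in repeat_types.items()
}
for name, shifted in frame_effect.items():
    print(f"{name}: {'frameshift' if shifted else 'in-frame'}")
```

Only tri- and hexanucleotide repeats stay in frame after a single-unit change, which is consistent with their enrichment in CDS regions.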
Development and application of SSR markers associated with flower color. SSRs, as one of the most valuable types of molecular marker, have been widely applied in the identification and classification of cultivars [44], the assessment of genetic diversity [45,46], and the exploration of genetic relationships and intraspecific genetic divergence [3,36]. However, few reports are available on the development of trait-associated SSR markers. Xia et al. developed 191 polymorphic SSR markers based on transcriptome sequences and found nine SSR markers significantly associated with plant height through association analysis in coconut [40]. In this study, we developed eight polymorphic SSR markers associated with flower color based on seven transcripts (Table 3), which were derived from five genes involved in carotenoid metabolism or anthocyanin synthesis: CHS, CHI, 3MaT, LCYE and PSY. Among them, CHS and CHI are two key genes in the anthocyanin biosynthetic pathway. The pigment content of radish fleshy root was highly correlated with the expression level of the CHS gene [47], and CHI played an essential role in seed, fruit, and flower color formation [22,44]. 3MaT, encoding an anthocyanin malonyltransferase, has been reported to be closely related to the content and stability of anthocyanin in dahlia flowers [28,29]. Additionally, LCYE and PSY are two key genes in the carotenoid metabolic pathway [13]. A mutant of the OsLCYE gene had higher carotenoid content than wild-type plants under salt stress [30]. PSY catalyzes the first step of the carotenoid biosynthetic pathway and is an important rate-limiting enzyme of carotenoid biosynthesis [48]. Furthermore, anthocyanins and carotenoids have been reported to be the main pigments determining flower color [22,23]. Therefore, the eight newly developed SSR markers in this study are likely to be related to flower color.
The PIC value represents the informativeness of a molecular marker and is categorized as low (PIC < 0.25), moderate (0.25 < PIC < 0.5), or high (PIC > 0.5) [49]. In this study, we used the eight newly developed SSR markers to evaluate the genetic diversity of 117 chrysanthemum accessions at the DNA and cDNA levels. On average, these SSRs exhibited moderate PIC values, ranging from 0.16 to 0.53 with an average of 0.41 at the DNA level and from 0.06 to 0.70 with an average of 0.44 at the cDNA level (Table 3), which were lower than those reported for chrysanthemum [50]. In addition, we found that the average PIC value at the cDNA level was higher than that at the DNA level (Table 3), implying that the SSRs were more informative at the transcriptional level. These results indicated that the newly developed SSR markers have potential for further genetic study in chrysanthemum and its relatives. In recent years, SSR markers have been widely used for classification in Chrysanthemum. Zhang et al. used SSR molecular markers to identify and classify 480 Chinese traditional ornamental chrysanthemum cultivars [44]. Feng et al. used SSR markers to analyze the phylogenetic relationships of 32 medicinal chrysanthemum cultivars and found that they were divided into two groups, with group I including all the "Machengju" and "Hangju" samples [45]. Olejnik et al. used 14 polymorphic SSRs to classify 97 chrysanthemum cultivars [46]; all the cultivars were divided into four clusters, and the first cluster contained only small-flowered accessions. In this study, we used the eight newly developed SSR markers to cluster 117 chrysanthemum accessions with diverse flower colors. We found that four SSR markers, LCYE-1, LCYE-4, PSY-1a, and PSY-1b, divided the 117 accessions into five groups at both the DNA and cDNA levels, but there was no obvious clustering by flower color at the DNA level.
At the cDNA level, all purple chrysanthemum accessions were in group III (Fig. 3b), implying that these four SSR markers may be correlated with purple color. Furthermore, the results of the cluster analysis implied that the CHS-3, 3MaT and LCYE-1 markers may be related to green color and the PSY-1b marker may be related to yellow color (Supplementary Fig. 2, 4, 6 and 7).
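The group compositions reported in the Results for the cDNA-based clustering (Fig. 3b) can be tallied to check that they sum to 117 accessions and to see how each color is distributed across groups. The Python sketch below uses only the counts quoted in the text; the data layout and helper names are illustrative.

```python
from collections import Counter

# Group composition at the cDNA level, as reported in the Results (Fig. 3b).
groups = {
    "I": {"yellow": 20, "red": 11, "white": 8, "mixed": 1},
    "II": {"yellow": 2, "red": 3, "white": 2},
    "III": {"yellow": 11, "red": 6, "white": 9, "purple": 10, "green": 2, "mixed": 22},
    "IV": {"red": 2, "white": 1},
    "V": {"yellow": 2, "red": 2, "white": 1, "green": 1, "mixed": 1},
}

# Accessions per group and in total.
group_sizes = {name: sum(colors.values()) for name, colors in groups.items()}
total_accessions = sum(group_sizes.values())

# Accessions per flower color across all groups.
color_totals = Counter()
for colors in groups.values():
    color_totals.update(colors)


def share_in_group(color: str, group: str) -> float:
    """Fraction of all accessions of `color` that fall in `group`."""
    return groups[group].get(color, 0) / color_totals[color]


print(group_sizes, total_accessions)
print(f"purple in III: {share_in_group('purple', 'III'):.2f}")
print(f"mixed in III:  {share_in_group('mixed', 'III'):.2f}")
print(f"yellow in I:   {share_in_group('yellow', 'I'):.2f}")
```

All 10 purple accessions and 22 of the 24 mixed-color accessions fall in group III, while 20 of the 35 yellow accessions fall in group I.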
Taken together, our work is a new attempt to develop SSR molecular markers related to specific traits based on full-length transcriptome sequences, and it will lay a solid foundation for genetic diversity analysis, classification and molecular-assisted breeding in Chrysanthemum.

Materials and methods
RNA extraction, SMRTbell library preparation, and transcriptome sequencing. Total RNA was extracted from the flowers of the chrysanthemum cultivar 'Hechengxinghuo' using the RNAprep Pure Plant Plus Kit (Tiangen, Beijing, China) following the manufacturer's protocol. The integrity of the RNA was then tested using an Agilent 2100 instrument. The cDNA was synthesized using a SMARTer® PCR cDNA Synthesis Kit (Roche, Switzerland). To construct the SMRTbell library, PCR amplification was performed with KAPA HiFi PCR Kits (Roche, Switzerland), and the amplified products were used to generate the SMRTbell library with the SMRTbell template prep kit 1.0. After library construction, a certain concentration and volume of library template and enzyme complex were loaded into the nanowells of the PacBio Sequel sequencer to start real-time single-molecule sequencing (Nextomics Biosciences, Wuhan, China).
Analysis of transcriptome data and microsatellite mining. After sequencing, high-quality transcriptome data were obtained by filtering, and the clean data were processed using PacBio SMRT Link version 5.1. To obtain annotation information for the transcripts, the non-redundant transcript sequences were aligned to the NR, SwissProt, GO, COG, KOG and KEGG databases using BLAST software (version 2.2.26) [31]. The full-length consensus sequences were used for subsequent analysis. Potential SSRs in the transcript sequences were searched for and analyzed using MISA [32] with default parameters. The SSRs obtained contained basic motifs with mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats.
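To illustrate what a microsatellite search of this kind does, the following Python sketch finds perfect SSRs with a regular expression. It is not MISA itself: it ignores compound SSRs and MISA's interruption setting, and the minimum repeat numbers (10, 6, 5, 5, 5, 5 for mono- to hexanucleotide motifs) are the commonly cited MISA defaults, taken here as an assumption because the text only states that default parameters were used. The example sequence is hypothetical.

```python
import re
from typing import NamedTuple

# Assumed thresholds: motif length -> minimum number of repeat units.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based start position in the sequence
    end: int    # 0-based end position (exclusive)


def is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(sequence: str, min_repeats=MIN_REPEATS):
    """Return perfect SSRs in `sequence`, grouped by increasing motif length."""
    sequence = sequence.upper()
    hits = []
    for unit_len in sorted(min_repeats):
        # One captured unit followed by at least (minimum - 1) further copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats[unit_len] - 1)
        )
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already reported under the shorter motif
            repeats = len(match.group(0)) // unit_len
            hits.append(SSR(motif, repeats, match.start(), match.end()))
    return hits


# Hypothetical transcript fragment containing (A)12, (AC)7 and (CAG)5.
example = "TTGC" + "A" * 12 + "GCTG" + "AC" * 7 + "GGTT" + "CAG" * 5 + "TTC"
ssrs = find_ssrs(example)
for ssr in ssrs:
    print(f"({ssr.motif}){ssr.repeats} at {ssr.start}-{ssr.end}")
```

Because the thresholds are passed as a dictionary, stricter or looser search criteria can be tried without changing the function; as noted in the Discussion, such criteria affect the reported SSR frequencies.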
Chrysanthemum SSR primer design. Transcripts related to flower color formation and regulation were selected to develop SSR markers. Oligonucleotide primers were designed according to the flanking sequences of the SSRs using the Primer 3.0 software. Potential SSR markers were selected according to the following parameters: primer length between 18 and 22 bp, PCR product length between 100 and 300 bp, primer melting temperature (Tm) between 55 and 65 °C, and GC content of 40-60%.
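The selection criteria above can be expressed as a simple filter over candidate primer pairs. The Python sketch below is a minimal illustration, not the Primer 3.0 workflow: the `PrimerPair` class and the example sequences are hypothetical, and Tm values are taken as inputs supplied by the design software rather than recalculated.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    forward: str
    reverse: str
    forward_tm: float   # melting temperature in degrees C, from the design tool
    reverse_tm: float
    product_size: int   # expected PCR product length in bp


def gc_percent(primer: str) -> float:
    """GC content of a primer as a percentage of its length."""
    primer = primer.upper()
    return 100 * (primer.count("G") + primer.count("C")) / len(primer)


def meets_criteria(pair: PrimerPair) -> bool:
    """Apply the thresholds stated in the text to both primers and the product."""
    for primer, tm in ((pair.forward, pair.forward_tm), (pair.reverse, pair.reverse_tm)):
        if not 18 <= len(primer) <= 22:
            return False
        if not 55 <= tm <= 65:
            return False
        if not 40 <= gc_percent(primer) <= 60:
            return False
    return 100 <= pair.product_size <= 300


# Hypothetical candidates: the second fails only on product length.
good = PrimerPair("ATGCTAGCTAGGCTAGCTAA", "TTAGCCGATCGGATCCATGA", 58.0, 59.5, 180)
too_long = PrimerPair("ATGCTAGCTAGGCTAGCTAA", "TTAGCCGATCGGATCCATGA", 58.0, 59.5, 450)
print(meets_criteria(good), meets_criteria(too_long))
```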
DNA and RNA extraction from floral materials of 117 chrysanthemum accessions. Total DNA was extracted using the Plant Genomic DNA Extraction Kit (Sangon Biotech, Shanghai, China), and RNA was extracted using the Spin Column Plant Total RNA Purification Kit (Sangon Biotech) following the manufacturer's protocol. The RNA (1.0 μg) was reverse-transcribed into cDNA using the PrimeScript RT reagent Kit (Takara Bio, Tokyo, Japan) following the manufacturer's protocol. The DNA and cDNA concentration of each sample was measured using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and adjusted to a final concentration of 20 ng/μl.